Plant Communications — Latest Matching Preprints

1

AraENCODE: a comprehensive epigenomic database of Arabidopsis Thaliana

Wang, Z.; Liu, M.; Lai, F.; Fu, Q.; Xie, L.; Fang, Y.; Zhou, Q.; Li, G.

2023-06-12 bioinformatics 10.1101/2023.06.10.544382 medRxiv

Top 0.1%

38.7%

Arabidopsis (Arabidopsis thaliana) is a vital model organism in plant biology and genetics. In the last two decades, researchers have made significant progress in characterizing the chromatin conformation and epigenomic information within the Arabidopsis genome. This information includes but is not limited to the higher-order structure of chromosomes, histone modification, DNA methylation, and chromatin accessibility. The results of these studies have provided an additional layer of information that complements the DNA sequence data. However, utilizing such knowledge poses a challenge for groups that lack bioinformatics analysts or adequate computing resources. A user-friendly and reproducible platform for accessing this information is urgently needed. In this study, we have developed a comprehensive epigenomic database for Arabidopsis (AraENCODE, http://glab.hzau.edu.cn/AraENCODE), which comprises a total of 4,511 data libraries, including published chromatin conformation capture datasets (Hi-C, HiChIP), epigenomic datasets (ChIP-Seq, ATAC-Seq, FAIRE-Seq, MNase-Seq, DNase-Seq, BS-seq), and transcriptome data (RNA-Seq, miRNA-Seq). Furthermore, we have incorporated various existing resources, such as single nucleotide polymorphisms (SNPs), cis-regulatory modules, and multi-omics associations. We aim to provide a novel platform for investigating the regulation of epigenetic and chromatin interactions in Arabidopsis in relation to biological processes.

2

Pan-genome and Haplotype Map of Cultivars and Their Wild Ancestors Provides Insights into Adaptive Evolution of Cassava (Manihot esculenta Crantz)

Xia, Z.; Du, Z.; Zhou, X.; Jiang, S.; Zhu, T.; Wang, L.; Chen, F.; Carvalho, L. J. C. B.; Zou, M.; Becerra, L. A.; Zhang, X.; Xu, L.; Wang, Z.; Chen, M.; Wang, S.; Li, M.; Li, Y.; Wang, H.; Liu, S.; Bao, Y.; Zhao, L.; Zhang, C.; Xiao, J.; Guo, F.; Shen, X.; Lu, C.; Qiao, F.; Ceballos, H.; Yan, H.; Zhang, H.; He, S.; Zhao, W.; Wan, Y.; Chen, Y.; Huang, D.; Li, K.; Liu, B.; Peng, M.; Zhang, W.; Muller, B. L.; Chen, X.; Luo, M.-C.; Xiao, J.; Wang, W.

2023-07-03 systems biology 10.1101/2023.07.02.546475 medRxiv

Top 0.1%

32.4%

Cassava is one of the most important starch sources and a tropical model crop. We constructed nearly T2T genomes of the cultivar AM560 and the wild ancestors FLA4047 and W14, a pan-genome of 24 representatives, and a clarified evolutionary tree with 486 accessions. Comparison of SVs and SNVs between the ancestors and cultivated cassavas revealed predominant expansion and contraction of genes and gene families. Significant selective sweeps occurred in the cassava genomes in 122 footprints with 1,519 candidate domestication genes. We identified selective mutations in MeCSK and MeFNR3 promoting photoreaction associated with MeNADP-ME of C4 assimilation in modern cassava. Co-evolved retardation of floral primordia and initiation of storage roots arose from MeCOL5 mutants with altered binding to MeFT1, MeFT2 and MeTFL2. MebHLHs evolved to regulate the biosynthesis, transport and endogenous remobilization of cyanogenic glucosides, with new functionalities of MeMATE1 and MeGTR in selected sweet cassava. These findings enhance comprehensive knowledge and databases on the evolution and breeding of cassava. Highlights: (1) Three nearly T2T cassava genomes of cultivar AM560 and its wild ancestors FLA4047 and W14. (2) A species-level cassava panSV haplotype map across 346,322 structural variations over 31,362 gene families and 96,032,008 SNPs and InDels variations globally, and a clarified evolutionary tree with 486 accessions. (3) Selective mutations in MeCSK and MeFNR3 that promoted photoreaction associated with MeNADP-ME of C4 assimilation shaped the C3-C4 intermediate photosynthesis of modern cassava. (4) Coevolution of floral primordia contrary to initiated storage root is pivotal for the domestication of cassava, and arose from MeCOL5 mutants with altered binding to MeFT1, MeFT2 (SP6A), and MeTFL2.

3

RLKdb: A comprehensive curated receptor-like kinase family database

Yin, Z.; Liu, J.; Dou, D.

2023-12-19 plant biology 10.1101/2023.12.18.572263 medRxiv

Top 0.1%

25.8%

Dear Editor, Since the first plant receptor-like kinase (RLK) gene ZmPK1 was cloned from Zea mays in 1990 (Walker & Zhang, 1990), this large gene family has been extensively studied and shown to play crucial roles in growth, development, and immunity (Tang et al., 2017). RLKs are widespread in the plant kingdom, but the biological functions of most RLKs remain largely elusive (Dievart et al., 2020). Given that RLKs share a conserved monophyletic RLK/Pelle kinase domain, RLKs in several model plants are classified into distinct families by their extracellular domains (ECDs) (Shiu & Bleecker, 2001). However, independent domain shuffling in specific lineages drives the origin of novel families, which raises a question: what is the landscape of RLKs across the whole plant kingdom? Previously, sequence homology-based methods have been widely used for RLK identification and classification; these will probably miss distantly related proteins with similar structures, as well as potential novel families not mentioned in the literature. The academic community urgently requires a dedicated database for a systematic overview of the RLK gene family, providing data support for in-depth research on RLK genes. Here, we used a topology-based method to accurately isolate the RLKomes from proteomes. The obtained RLKomes were further classified into (sub)families based on ECD domains. We constructed a comprehensively curated plant RLK database (https://biotec.njau.edu.cn/rlkdb/), which contains valuable resources for investigating the origin and evolution of the RLK family and multiple online tools for personalized analysis.

4

The Plantae Visualization Platform: a comprehensive web-based tool for the integration, visualization, and analysis of omic data across plant and related species

Santiago, A.; Orduna, L.; Fernandez, J. D.; Vidal, A.; de Martin-Agirre, I.; Lison, P.; Vidal, E.; Navarro-Paya, D.; Matus, J. T.

2024-12-22 bioinformatics 10.1101/2024.12.19.629382 medRxiv

Top 0.1%

18.4%

The increasing availability of omics data from non-model plant species has created a pressing need for centralized, user-friendly platforms that maximize the utility of these datasets in a FAIR manner. Here, we introduce PlantaeViz, a web-based tool designed for the integration, visualization, and analysis of multi-omics data across a wide range of plant species. PlantaeViz offers advanced functionalities, including gene catalogues built from curated literature, transcriptomic meta-analyses presented as gene expression atlases, gene co-expression and regulatory networks with on-the-fly ontology analyses, cistrome visualization and metabolomics-transcriptomics integration, among other tools, providing a robust framework for hypothesis generation and biological interpretation. Gene Cards applications, tailored to each plant species, provide both community-curated and automatically generated functional annotation information. One of the platform's core features is its big data approach: over 58,000 publicly available SRA transcriptomic samples have been processed and visualized to date. Significant efforts have been made in orthology assessment using multiple layers of evidence, as well as in the automatic classification and standardization of omics metadata through regular expressions and data mining. As a result, around 90% of transcriptomic runs have been successfully classified according to sample tissue. These data have been used to construct gene networks via computationally intensive methods based on diverse algorithms. We present a case study to illustrate the platform's integration and exploratory capabilities. PlantaeViz bridges genomics and functional knowledge between model and non-model plant species and aims to expand its species catalogue in the future, democratizing access to large-scale plant omics data. Further developments will include the incorporation of additional data types and the implementation of new tools to further support plant research across diverse biological contexts.

5

Horizontal Transfers Lead to the Birth of Momilactone Biosynthetic Gene Clusters in Grass

Wu, D.; Hu, Y.; Akashi, S.; Nojiri, H.; Ye, C.-Y.; Zhu, Q.-H.; Okada, K.; Fan, L.

2022-01-12 evolutionary biology 10.1101/2022.01.11.475971 medRxiv

Top 0.1%

18.3%

Momilactone A, an important plant labdane-related diterpenoid, functions as a phytoalexin against pathogens and an allelochemical against neighboring plants. The genes involved in biosynthesis of momilactone A are found in clusters, i.e., MABGCs (Momilactone A biosynthetic gene clusters), in the rice and barnyardgrass genomes. How MABGCs originate and evolve is still not clear. Here, we integrated results from comprehensive phylogeny and comparative genomic analyses of the core genes of MABGC-like clusters and MABGCs in 40 monocot plant genomes, providing convincing evidence for the birth and evolution of MABGCs in grass species. The MABGCs found in the PACMAD clade of the core grass lineage (including Panicoideae and Chloridoideae) originated from a MABGC-like cluster in Triticeae (BOP clade) via horizontal gene transfer (HGT), followed by recruitment of MAS and CYP76L1 genes. The MABGCs in Oryzoideae originated from PACMAD through another HGT event and lost CYP76L1 afterwards. The Oryza MABGC and another Oryza diterpenoid cluster, c2BGC, are two distinct clusters, with the latter having originated from gene duplication and relocation within Oryzoideae. Further comparison of the expression patterns of the MABGC genes between rice and barnyardgrass in response to pathogen infection and allelopathy provides novel insights into the functional innovation of MABGCs in plants. Our results demonstrate HGT-mediated origination of MABGCs in grass and shed light on the evolutionary innovation and optimization of plant biosynthetic pathways.

6

SapBase (Sapinaceae Genomic DataBase): a central portal for functional and comparative genomics of Sapindaceae species

Li, J.; Chen, C.; Zeng, Z.; Wu, F.; Feng, J.; Liu, B.; Mai, Y.; Chu, X.; Wei, W.; Li, X.; Liang, Y.; Liu, Y.; Xu, J.; Xia, R.

2022-11-29 genomics 10.1101/2022.11.25.517904 medRxiv

Top 0.1%

18.1%

Sapindaceae is a family of flowering plants, also known as the soapberry family, comprising 141 genera and about 1900 species (Pedro et al., 2010). Most of them are distributed in tropical and subtropical regions and include trees, shrubs, and woody or herbaceous vines. Some are dioecious, while others are monoecious. Many Sapindaceae species possess great economic value; some furnish delicious fruits, like lychee (Litchi chinensis), longan (Dimocarpus longan), rambutan (Nephelium lappaceum), and ackee (Blighia sapida), the national fruit of Jamaica; some produce abundant secondary metabolites, like saponin from soapberry (Sapindus mukorossi) and seed oil from yellowhorn (Xanthoceras sorbifolium); some yield valuable timber, including maple (Acer spp.) and buckeye (Aesculus glabra); and some are of great herbal medicinal value, like balloon-vine (Cardiospermum halicacabum). In the last decade, with the rapid rise of next generation sequencing (NGS) and genomic technologies, the full genome sequences of several Sapindaceae plants have been resolved (Lin et al., 2017; Liang et al., 2019; Yang et al., 2019; Zhang et al., 2021; Hu et al., 2022; Xue et al., 2022). Among them, our recent publication of the lychee genome attracted broad attention (Edger, 2022; Hu et al., 2022; Lyu, 2022). The post-genome era has now arrived for Sapindaceae; however, there is no public genomic database available for any Sapindaceae species, let alone an integrative database for the whole Sapindaceae family. A unified data platform is urgently needed to collect, manage and share relevant data resources. Therefore, we integrated our home-brew NGS data with all publicly available data for seven Sapindaceae plants and constructed the Sapinaceae Genomic DataBase, named SapBase (www.sapindaceae.com), in order to provide genomic resources and a powerful online analytic platform for scientific research on Sapindaceae species and comparative studies with other plants.

7

TOTEM: A web TOol for Tissue-EnrichMent analysis on gene lists

Coleto-Alcudia, V.; Lozano-Elena, F.; Vera, G.; Betegon-Putze, I.; Gupta, A.; Efroni, I.; Cano-Delgado, A. I.

2024-04-06 plant biology 10.1101/2024.04.04.588116 medRxiv

Top 0.1%

18.1%

Analysis of spatiotemporal patterns of gene expression is crucial to decode biological systems' responses. High-throughput sequencing allows in-depth transcriptome analyses and experimental designs, providing valuable reference expression atlases. Specifically, testing overrepresentations of tissue-specific genes based on these atlases can provide valuable insights; however, such an approach is not accessible to inexperienced users. Here, we introduce TOTEM (TOol for Tissue-EnrichMent), a web tool designed to calculate enrichment values per tissue by identifying tissue-specific genes from an organ/organism of interest given a user gene list. Results are visually represented, and the user's genes are classified. The utility of TOTEM is manifest when using integrated single cell expression atlases, enabling the study of complicated tissues with the maximum possible resolution. Its effectiveness is validated by the study of the role of BRL3 in stress, specifically in the vascular tissues. Finally, TOTEM's modular design allows for continual integration of new experiments. TOTEM can be freely accessed at: https://totemwebtool.com.

8

Tandemly duplicated MYB genes specifically in the Phaseoleae lineage are functionally diverged in the regulation of anthocyanin biosynthesis

Ma, R.; Huang, W.; Hu, Q.; Tian, G.; An, J.; Fang, T.; Liu, J.; Hou, J.; Zhao, M.; Sun, L.

2023-07-16 evolutionary biology 10.1101/2023.07.15.549139 medRxiv

Top 0.1%

17.8%

Gene duplications have long been recognized as a driving force in the evolution of genes, giving rise to novel functions. The soybean genome is characterized by a large extent of duplicated genes. However, the extent and mechanisms of functional divergence among these duplicated genes in soybean remain poorly understood. In this study, we revealed that tandem duplication of MYB genes, which occurred specifically in the Phaseoleae lineage, exhibited a stronger purifying selection in soybean compared to common bean. To gain insights into the diverse functions of these MYB genes in anthocyanin biosynthesis, we examined the expression, transcriptional activity, metabolite, and evolutionary history of four MYB genes (GmMYBA5, GmMYBA2, GmMYBA1 and Glyma.09g235000), which were presumably generated by tandem duplication in soybean. Our data revealed that Glyma.09g235000 had become a pseudogene, while the remaining three MYB genes exhibited strong transcriptional activation activity and promoted anthocyanin biosynthesis in different soybean tissues. Furthermore, GmMYBA5 produced distinct compounds in Nicotiana benthamiana leaves compared to GmMYBA2 and GmMYBA1 due to variations in their DNA binding domains. The lower expression of anthocyanin related genes in GmMYBA5 resulted in lower levels of anthocyanins compared to GmMYBA2 and GmMYBA1. Metabolomics analysis further demonstrated the diverse and differential downstream metabolites, suggesting their functional divergence in metabolites following gene duplication. Together, our data provided evidence of functional divergence within the MYB gene cluster following tandem duplication, which shed light on the potential evolutionary direction of gene duplications during legume evolution.

9

Chromosome-scale genome assembly of Tinospora sagittata (Oliv.) Gagnep. enhances identifying genes involved in the biosynthesis of jatrorrhizine

Alami, M. M.; Shu, S.; Liu, S.; Ouyang, Z.; Zhang, Y.; Lv, M.; Sang, Y.; Gong, D.; Yang, G.; Feng, S.; Mei, Z.; Xie, D.-Y.; Wang, X.

2023-07-21 genomics 10.1101/2023.07.20.549971 medRxiv

Top 0.1%

17.7%

Tinospora sagittata (Oliv.) Gagnep. is an important medicinal tetraploid plant in the Menispermaceae family. Its tuber, namely "Radix Tinosporae" used in Traditional Chinese Medicine, is rich in medicinal terpenoids and benzylisoquinoline alkaloids (BIAs). To enhance understanding of the biosynthesis of medicinal compounds, we herein report the assembly of a high-quality chromosome-scale genome with both PacBio HiFi and Illumina sequencing technologies. The size of the assembled genome was 2.33 Gb, consisting of 4070 scaffolds (N50 = 42.06 Mb), of which 92.05% were assigned to 26 pseudochromosomes in A and B sub-genomes. A phylogenetic analysis with T. sagittata and 16 other plant genomes estimated the evolutionary placement of T. sagittata and its divergence time in Ranunculales. Further genome evolution analysis characterized one round of tandem duplication about 1.5 million years ago (MYA) and one whole-genome duplication (WGD) about 86.9 MYA. WGD contributed to the duplication of the clade-specific cytochrome P450 gene family in Ranunculales. Moreover, sequence mining obtained genome-wide genes involved in the biosynthesis of alkaloids and terpenoids. TsA02G014550, one candidate, was functionally characterized to catalyze the formation of (S)-canadine in the jatrorrhizine biosynthetic pathway. Taken together, the assembled genome of T. sagittata provides useful sequences to understand the biosynthesis of jatrorrhizine and other BIAs in plants.

10

PlantMDCS: A code-free, modular toolkit for rapid deployment of plant multi-omics databases

Chen, C.; Liu, Y.; Wang, L.; Sai, J.; Wang, Y.; Yue, W.; Sun, J.; Li, Z.; Wang, F.; Tian, J.; Xu, D.; Fang, Y.

2026-02-11 bioinformatics 10.64898/2026.02.09.704752 medRxiv

Top 0.1%

15.2%

With the rapid accumulation of diverse omics datasets, achieving efficient management and integrative analysis of plant multi-omics data remains a major challenge. Conventional solutions rely on constructing web-based databases, which often demand substantial programming expertise and long-term financial support. To address these limitations, we developed the Plant Multi-omics Database Construction System (PlantMDCS), a locally deployable, user-friendly, and collaborative platform that unifies database construction and downstream multi-omics analysis within a graphical environment. PlantMDCS adopts a decoupled front-end/back-end architecture. The back end serves as the core engine for data management and computation, and is responsible for the storage, preprocessing, integration, and hierarchical association of multi-omics data. Once initialized, the front end supports the complete research workflow, including data import, querying, integrative analysis and visualization. All operations can be performed without programming, while local resource usage is dominated by disk storage required for user-provided datasets rather than sustained computational overhead. Benchmarking across plant species ranging from Arabidopsis to hexaploid wheat demonstrated that database construction can be completed within minutes, independent of genome size or data complexity. PlantMDCS is designed for local deployment to ensure data security, while allowing multi-user collaboration within local networks and supporting controlled remote access for teams distributed across different regions. Overall, PlantMDCS offers a secure and sustainable framework that integrates data management and analysis within a unified system. This design shifts multi-omics research away from fragmented file-based processing toward persistent, database-driven exploration, thereby enhancing analytical efficiency and reproducibility.

11

Regulation of the aurantio-obtusin accumulation by StTCP4.1-mediated StDA1-StHDR1 module in Senna tora seeds

Liu, S.; Liu, J.; Abozeid, A.; Ying, X.; Dong, J.; Liang, Z.-s.

2024-01-08 molecular biology 10.1101/2024.01.08.574662 medRxiv

Top 0.1%

15.0%

Senna tora (S. tora) is a commonly used Chinese medicinal plant due to the presence of bioactive anthraquinones in its mature seeds. Seed size is an important factor that affects S. tora yield quantity and quality. However, the mechanism regulating seed size and aurantio-obtusin biosynthesis in S. tora is still unclear. In this study, we identified the ubiquitin receptor StDA1 in S. tora, which served as a negative regulator of seed formation and limited seed enlargement. Antisense overexpression of StDA1 led to larger seeds in S. tora and promoted the accumulation of aurantio-obtusin. In contrast, overexpression of StDA1 in S. tora resulted in a relative decrease in aurantio-obtusin accumulation. Moreover, StDA1 can directly bind to StHDR1 and regulate its degradation through the 26S proteasome to regulate seed size and aurantio-obtusin accumulation. We also found that the StDA1-StHDR1 module is responsive to MeJA via StTCP4.1, which in turn affects the accumulation of aurantio-obtusin. Overall, we have identified a protein complex that regulates the accumulation of aurantio-obtusin, StTCP4.1-StDA1-StHDR1, as a potential target for improving S. tora yield quantity and quality.

12

EpiReasoner: An Integrated Artificial Intelligence Framework for Phenotype-to-Genotype Reasoning in Plant Epidermal Development

Zhang, H.; Feng, X.

2026-05-18 plant biology 10.64898/2026.05.13.724792 medRxiv

Top 0.1%

14.8%

Achieving high-throughput and precise phenotypic quantification and imaging modalities of stomatal and epidermal cells across diverse species remains a primary bottleneck in elucidating the mechanisms of stomatal dynamics, epidermal patterning, and environmental adaptation of plants. Here, we developed EpiReasoner, an artificial intelligence framework comprising a vision module, EpiVision, and a knowledge-based reasoning module, EpiBrain, for the quantitative phenotypic analysis and domain-specific knowledge reasoning of stomatal complexes and pavement cells in plants. Operating across bright-field, scanning electron microscopy, and differential interference contrast modalities, EpiVision achieves precise instance segmentation in various monocotyledonous, dicotyledonous, and fern species. Its performance significantly surpasses current state-of-the-art models. Moreover, we defined 23 quantitative indices describing stomatal cell morphology and spatial distribution. For domain-specific tasks such as phenotype prediction, genotype deduction, and molecular mechanism reasoning, EpiBrain demonstrates a human preference rate significantly higher than that of general-purpose large language models, including GPT-5 and Claude Sonnet 4. The application of EpiReasoner to phenotypic data of stomatal density derived from a tomato natural population of 170 accessions successfully identified a major quantitative trait locus on chromosome 8. The candidate gene, SKP1-interaction partner 19L (SKIP19L), encoding an F-box family protein, exhibited severe allele frequency drift during tomato domestication, which is highly consistent with the adaptive trend of reduced stomatal density under artificial selection. EpiReasoner provides a novel paradigm that unifies visual phenomics and knowledge-driven reasoning for the biology of stomata and pavement cells, thereby significantly accelerating scientific discovery in plant science.

13

easyMF: A Web Platform for Matrix Factorization-based Biological Discovery from Large-scale Transcriptome Data

Ma, W.; Chen, S.; Zhai, J.; Qi, Y.; Xie, S.; Song, M.; Ma, C.

2020-12-22 bioinformatics 10.1101/2020.12.21.405563 medRxiv

Top 0.1%

14.7%

With the development of high-throughput experimental technologies, large-scale RNA sequencing (RNA-Seq) data have been and continue to be produced, but this has led to challenges in extracting relevant biological knowledge hidden in the resulting high-dimensional gene expression matrices. Here, we present easyMF, a user-friendly web platform that aims to facilitate biological discovery from large-scale transcriptome data through matrix factorization (MF). The easyMF platform enables users with little bioinformatics experience to streamline transcriptome analysis from raw reads to gene expression and to decompose an expression matrix from thousands of genes to a handful of metagenes. easyMF also offers a series of functional modules for metagene-based exploratory analysis with an emphasis on functional gene discovery. As a modular, containerized and open-source platform, easyMF can be customized to satisfy users' specific demands and deployed as a web server for broad applications. easyMF is freely available at https://github.com/cma2015/easyMF. We demonstrated the application of easyMF with four case studies using 940 RNA sequencing datasets from maize (Zea mays L.).

14

SiPLATZ12 transcript factor regulates multiple yield traits and salt tolerance in foxtail millet (Setaria italica)

Wu, C.; Xiao, S.; Wan, Y.; Zhang, L.; Tang, S.; Sui, Y.; Bai, Y.; Wang, Y.; Liu, M.; Fan, J.; Zhang, S.; Huang, J.; Yang, G.; Yan, K.; Diao, X.; Zheng, C.

2022-07-01 molecular biology 10.1101/2022.07.01.498439 medRxiv

Top 0.1%

14.6%

Grain yield and salt tolerance are critical for crop production. However, the genetic and biochemical basis underlying the trade-off between these characters remains poorly described in crops. We show here that the SiPLATZ12 transcription factor positively regulates multiple elite yield traits at the expense of salt tolerance in foxtail millet. SiPLATZ12 overexpression increases seed size, panicle length, and stem diameter, while reducing plant height and salt tolerance of foxtail millet. A 9-bp insertion in the SiPLATZ12 promoter has significant effects on the differential expression of SiPLATZ12, multiple yield traits, and salt tolerance between foxtail millet and its wild ancestor, green foxtail. Moreover, SiPLATZ12 upregulates the expression of genes involved in seed development, but represses the transcription of most NHX, SOS, and CBL genes to regulate Na+, K+ and pH homeostasis. Therefore, our results uncover a domesticated site that could be used to improve grain yield and salt tolerance in foxtail millet.

15

Machine learning-based prediction of dynamic height heterosis with pathway biomarkers in rice

Dan, Z.; Chen, Y.; Huang, W.

2025-04-14 systems biology 10.1101/2024.11.09.622823 medRxiv

Top 0.1%

12.9%

The development of robust biomarkers enables accurate prediction of complex phenotypes. However, the dynamic nature of biomarkers is often underestimated, even though their quantitative changes during development are directly connected to phenotypic transformations, influencing both crop agronomic traits and human diseases. Here, we performed network analysis of untargeted metabolite profiles to investigate height heterosis in rice, which is dynamic, varying during development, and is a key determinant of yield heterosis. We found that the levels of pyruvaldehyde were predictive of height heterosis specifically at the seedling stage, while 4-hydroxycinnamic acid positively correlated with height heterosis across four developmental stages. We identified metabolic pathways associated with height heterosis and found that metabolomic changes during the elongation stage had a greater impact than those in other stages. Finally, 11 heterosis-associated pathways were developed into metabolomic biomarkers through random forest analysis, successfully predicting height heterosis in an independent population under different growth conditions. This study elucidates the metabolomic landscape of dynamic height heterosis in rice and develops pathway biomarkers for complex phenotypes, demonstrating robustness across diverse populations, environments, and developmental stages.

16

Highly heterozygous Citrus changshan-huyou Y. B. Chang originated from ancient hybridization between mandarin and pummelo and displayed distinct tissue-specific allelic imbalance

Jia, Y.; Zeng, Z.; Luo, Y.; Hu, H.; Lan, L.; Guo, B.; Zhou, P.; Tan, C.; Huang, X.; Qi, T.; Chen, Z.; Yu, Z.; Wang, L.; Xiang, T.; Li, C.

2025-03-26 evolutionary biology 10.1101/2025.03.24.644872 medRxiv

Top 0.1%

12.9%

The genus Citrus is characterized by a reticulate evolutionary history with frequent hybridization, making it an intriguing subject for genome evolution investigation. Citrus changshan-huyou Y. B. Chang (Huyou) is a unique landrace with premium fruit quality, first discovered in Zhejiang Province, China. The evolutionary origin of Huyou has puzzled local botanists and growers. Here, we sequenced a 120-year-old "ancestral tree" of Huyou using PacBio long-read and Hi-C sequencing and assembled two high-quality haplotype-resolved genomes, HY1 and HY2. Huyou displayed a genome heterozygosity level of 3.07%, among the highest in published citrus genomes. Using a k-mer-based tracing approach, we explicitly resolved that the HY1 genome contained 87.8% mandarin, 7.3% pummelo, and 0.2% citron origin, whereas HY2 had 85.0% pummelo, 2.9% mandarin, and 0.3% citron, implying a hybridization event between mandarin and pummelo. Phylogeny dating showed that HY1 (2.0 Mya) and HY2 (2.18 Mya) had diverged earlier than the split of Citrus clementina and Citrus reticulata, and the split of Citrus grandis and Citrus maxima, respectively. We observed clear chromosomal recombination on chr8 and chr9 in HY1, which may have occurred after the ancestral hybridization. Further transcriptome analyses in six tissues revealed a strong allelic dominance of HY2 over HY1 in root tissue and a moderate one in stem, leaf, flower, and fruits. KEGG enrichment analyses revealed that genes related to antioxidant biosynthesis and lipid metabolism were most significantly affected by allelic imbalance. This first report of allelic imbalance in citrus species supports Huyou as an interesting model to investigate genome evolution following distant hybridization.

17

GENE-FAM: An automated pipeline for mining gene families and its application to MADS-box genes in Cannabis sativa

Ryan, L.; Trubanova, N.; Pender, G.; Melzer, R.; Hughes, G. M.; Schilling, S.

2026-06-15 genomics 10.64898/2026.06.10.731441 medRxiv

Top 0.1%

12.7%

Understanding how gene families evolve can offer great insight into adaptation at the phenotypic and ecological levels. This is particularly true in plants, where transcription factor gene families are often targeted for breeding programs to improve the agronomic traits of economically important crops. While recent advances in next generation sequencing have accelerated the wealth of genomics data, there remains a lack of accessible and reproducible genome mining pipelines tailored for gene family characterisation. Here, we address this gap by developing GENE-FAM, an automated, scalable and open-source pipeline designed to mine and predict gene families based on conserved domains and motifs. To illustrate its application, we apply GENE-FAM to annotate MADS-box transcription factor genes across multiple Cannabis sativa genomes. A comprehensive set of MADS-box genes was identified across three C. sativa cultivars, including both previously annotated and newly predicted genes. Through phylogenetic analyses, we confirm that all type II MADS-box gene subfamilies represented in flowering plants are present in C. sativa. Comparing our annotations with those of Arabidopsis thaliana and Solanum lycopersicum revealed that while most MADS type II families are highly conserved, SEPALLATA-like genes have undergone diversification in C. sativa. Together, these results demonstrate the application of GENE-FAM for genome-wide identification and characterisation of gene families in non-model species, revealing novel insights into MADS-box gene family evolution in C. sativa.

18

Systematic mining and genetic characterization of regulatory factors for wheat spike development

Lin, X.; Xu, Y.; Wang, D.; Yang, Y.; Zhang, X.; Bie, X.; Wang, H.; Jiang, J.; Ding, Y.; Lu, F.; Zhang, X.; Zhang, X.; Fu, X.; Xiao, J.

2022-11-11 plant biology 10.1101/2022.11.11.516122 medRxiv

Top 0.1%

12.3%

The spike architecture of wheat plays a crucial role in determining grain number, making it a key trait to optimize in wheat breeding programs. In this study, through a multi-omic approach, we analyzed the transcriptome and epigenome profiles of the shoot apex at eight developmental stages, revealing coordinated changes in chromatin accessibility and H3K27me3 abundance during the flowering transition. We constructed a core transcriptional regulatory network (TRN) that drives wheat spike formation, and experimentally validated a multi-layer regulatory module involving TaSPL15, TaAGLG1, and TaFUL2. By integrating the TRN with genome-wide association analysis (GWAS), we identified 227 transcription factors (TFs), including 42 with known functions and 185 with unknown functions. Further investigation of 61 novel TFs using multiple homozygous mutant lines uncovered 36 TFs with altered spike architecture or flowering time, such as TaMYC2-A1, TaMYB30-A1, and TaWRKY37-A1. Of particular interest, TaMYB30-A1, which acts downstream of and is repressed by WFZP, was found to regulate fertile spikelet number. Notably, during the domestication and breeding process in China, the favorable haplotype of TaMYB30-A1 containing a C allele at the WFZP binding site was enriched, leading to improved agronomic traits. Our study presents novel and high-confidence regulators and offers an effective strategy for understanding the genetic basis of wheat spike development, with practical impact for wheat breeding applications.

19

The ABI5-WRKY45-LSU1 axis confers tolerance of Arabidopsis thaliana to cadmium

Wang, J.; Li, F.; Zheng, X.; Zhang, Y.; Chen, J.; Lv, G.

2026-04-22 molecular biology 10.64898/2026.04.20.718842 medRxiv

Top 0.1%

11.9%

Abscisic acid (ABA) is involved in Cd tolerance in Arabidopsis, but the underlying mechanisms are unclear. In this study, we revealed that the ABI5-WRKY45-LSU1 axis confers tolerance of Arabidopsis to Cd stress. Under Cd stress, the biosynthesis of ABA is increased, and the expression of the transcription factor ABI5 is upregulated. Accordingly, abi5-8 mutants show increased Cd sensitivity. ABI5 directly binds the ABRE element in the WRKY45 promoter to activate its transcription. Overexpression of WRKY45 rescues the Cd-hypersensitive phenotype of the abi5-8 mutant, placing WRKY45 downstream of ABI5. Transcriptome analyses identified LSU1 as a potential WRKY45 target. qRT-PCR, DUAL-LUC and EMSA experiments verified that WRKY45 binds the W-box cis-element in the LSU1 promoter to activate its expression. Overexpression of LSU1 enhances Cd tolerance by promoting the biosynthesis of non-protein thiols (NPT), glutathione (GSH), and phytochelatins (PC). Moreover, overexpression of LSU1 suppresses Cd sensitivity in the wrky45 mutant, confirming that LSU1 acts downstream of WRKY45. In addition, we found that ATP sulfurylase 1 (APS1) interacts with LSU1, based on in vitro and in vivo evidence. LSU1 stabilizes APS1, slows its degradation, and enhances APS1 activity, thus leading to increased NPT, GSH, and PC accumulation and improved Cd detoxification. Notably, overexpressing LSU1 did not rescue the Cd sensitivity of the aps1-1 mutant, indicating that LSU1 acts upstream of and depends on APS1. In short, we demonstrated a novel ABI5-WRKY45-LSU1 axis that regulates Cd tolerance through sulfur assimilation and phytochelatin synthesis.

Highlights

- Cadmium stress triggers ABA biosynthesis and ABI5 expression; ABI5 directly binds to ABRE motifs in the WRKY45 promoter and activates its transcription.

- WRKY45 transcriptionally activates LSU1, and LSU1 interacts with APS1 to stabilize it and elevate ATP sulfurylase activity, acting in an APS1-dependent manner.

- The ABI5-WRKY45-LSU1 module enhances Arabidopsis Cd tolerance by boosting sulfur assimilation and GSH/PC-mediated Cd detoxification, rather than reducing Cd uptake.

20

Haploid-phased chromosomal telomere to telomere genome assembly of Uncaria rhynchophylla accelerating gene mining on the biosynthesis of medicinal alkaloids

Hu, T.; Duan, L.; Shangguan, L.; Zhao, Q.; Hang, Y.; Li, X.; Yang, N.; Yan, F.; Lv, Q.; Tang, L.; Liu, M.; Qiang, W.; Wang, X.; Wang, X.; Zhang, M.

2024-06-02 genomics 10.1101/2024.05.30.596046 medRxiv

Top 0.1%

11.7%

Documented traditional medicinal plants are important resources for new medicine discovery and development, containing many natural medicinal alkaloids. The genus Uncaria comprises such woody plants, with a high medicinal value owing to alkaloids such as (iso)rhynchophylline. Natural alkaloid content usually varies between germplasms and is affected by the growth environment, so a genomic approach is needed to understand in more detail the genetic and environmental factors that influence alkaloid production. Here, we report a haploid-resolved, chromosome-level T2T genome assembly of Uncaria rhynchophylla with a size of ~634 Mb and a contig N50 of 26 Mb, generated using PacBio HiFi long reads and Hi-C, with the contigs anchored on 22 pairs of confirmed chromosomes. This genome contains 56% repeat sequences and ~29,000 protein-encoding genes. U. rhynchophylla diverged from a common ancestor shared with Coffea around 20 million years ago and contains expanded and contracted gene families associated with secondary metabolites and plant-associated defenses. We constructed the pathway for (iso)rhynchophylline biosynthesis with genes mined from the genome and comparative transcriptomes. Of 2,578 metabolites, 53 alkaloids were identified in (iso)rhynchophylline biosynthesis, and eight differentially expressed genes were key to regulating the catalytic steps underlying the difference in alkaloid abundance between tissues. The chromosome-level genome and the (iso)rhynchophylline pathways constructed in this study provide a genetic basis and guidance for further breeding improvements and the development of pharmaceutical alkaloids.